Overview

Dataset statistics

Number of variables: 20
Number of observations: 6611
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.0 MiB
Average record size in memory: 640.3 B
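
The two memory figures above can be cross-checked against each other. The sketch below assumes the total is simply the average record size times the number of observations, and that MiB means 2**20 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block of the report.
N_OBSERVATIONS = 6611        # "Number of observations"
AVG_RECORD_BYTES = 640.3     # "Average record size in memory"

# Assumption: total size = average record size * number of records.
total_bytes = N_OBSERVATIONS * AVG_RECORD_BYTES
total_mib = total_bytes / 2**20   # 1 MiB = 2**20 bytes

print(f"{total_mib:.1f} MiB")
```

Under these assumptions the product rounds to the 4.0 MiB shown in the report.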

Variable types

Categorical: 12
Numeric: 8

Alerts

Source has constant value "AQS" (Constant)
POC has constant value "1" (Constant)
UNITS has constant value "ppm" (Constant)
AQS_PARAMETER_CODE has constant value "42101" (Constant)
AQS_PARAMETER_DESC has constant value "Carbon monoxide" (Constant)
STATE_CODE has constant value "36" (Constant)
STATE has constant value "New York" (Constant)
Date has a high cardinality: 731 distinct values (High cardinality)
Site ID is highly correlated with Daily Max 8-hour CO Concentration and 7 other fields (High correlation)
Daily Max 8-hour CO Concentration is highly correlated with Site ID and 7 other fields (High correlation)
DAILY_AQI_VALUE is highly correlated with Daily Max 8-hour CO Concentration (High correlation)
DAILY_OBS_COUNT is highly correlated with PERCENT_COMPLETE (High correlation)
PERCENT_COMPLETE is highly correlated with DAILY_OBS_COUNT (High correlation)
COUNTY_CODE is highly correlated with Site ID and 7 other fields (High correlation)
SITE_LATITUDE is highly correlated with Site ID and 7 other fields (High correlation)
SITE_LONGITUDE is highly correlated with Site ID and 6 other fields (High correlation)
Source is highly correlated with STATE and 9 other fields (High correlation)
POC is highly correlated with Source and 9 other fields (High correlation)
UNITS is highly correlated with Source and 9 other fields (High correlation)
Site Name is highly correlated with Site ID and 7 other fields (High correlation)
AQS_PARAMETER_CODE is highly correlated with Source and 9 other fields (High correlation)
AQS_PARAMETER_DESC is highly correlated with Source and 9 other fields (High correlation)
CBSA_CODE is highly correlated with Site ID and 7 other fields (High correlation)
CBSA_NAME is highly correlated with Site ID and 7 other fields (High correlation)
STATE_CODE is highly correlated with Source and 9 other fields (High correlation)
STATE is highly correlated with Source and 9 other fields (High correlation)
COUNTY is highly correlated with Site ID and 7 other fields (High correlation)
Date is uniformly distributed (Uniform)
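
The "Constant" alerts flag columns that hold a single distinct value. A minimal sketch of such a check follows. It uses made-up sample rows and a hypothetical helper `constant_columns`, and it is not the profiler's own implementation.

```python
# Hypothetical sample rows; column names follow the report, values are illustrative.
rows = [
    {"Source": "AQS", "UNITS": "ppm", "COUNTY": "Bronx"},
    {"Source": "AQS", "UNITS": "ppm", "COUNTY": "Erie"},
    {"Source": "AQS", "UNITS": "ppm", "COUNTY": "Monroe"},
]

def constant_columns(records):
    """Return {column: value} for columns holding exactly one distinct value."""
    distinct = {}
    for record in records:
        for column, value in record.items():
            distinct.setdefault(column, set()).add(value)
    return {col: next(iter(vals)) for col, vals in distinct.items() if len(vals) == 1}

constants = constant_columns(rows)
print(constants)
```

Columns flagged this way carry no information for modelling, which is presumably why the report also marks them as rejected.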

Reproduction

Analysis started: 2024-05-07 19:17:58.282213
Analysis finished: 2024-05-07 19:18:09.705849
Duration: 11.42 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json
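
The reported duration follows from the two timestamps. A small sketch using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the "Reproduction" block.
started = datetime.strptime("2024-05-07 19:17:58.282213", FMT)
finished = datetime.strptime("2024-05-07 19:18:09.705849", FMT)

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```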

Variables

Date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 731
Distinct (%): 11.1%
Missing: 0
Missing (%): 0.0%
Memory size: 432.7 KiB

12/31/2021: 10
08/05/2021: 10
08/16/2021: 10
08/15/2021: 10
08/14/2021: 10
Other values (726): 6561

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 66110
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
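
As an illustration of those character properties, the sketch below tallies the Unicode general category of every character in the five sample Date values shown in this report. `unicodedata.category` returns two-letter codes: "Nd" is Decimal Number and "Po" is Other Punctuation.

```python
import unicodedata
from collections import Counter

# Sample values copied from the report's "Sample" rows for the Date column.
dates = ["01/01/2020", "01/02/2020", "01/03/2020", "01/04/2020", "01/05/2020"]

# Tally the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for value in dates for ch in value
)
print(category_counts)
```

Each MM/DD/YYYY string contributes eight digits and two slashes, which matches the 80.0% / 20.0% split between Decimal Number and Other Punctuation reported for this column.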

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 01/01/2020
2nd row: 01/02/2020
3rd row: 01/03/2020
4th row: 01/04/2020
5th row: 01/05/2020
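
Date is profiled as a categorical variable because its values are MM/DD/YYYY strings. The sketch below parses two of them and counts the calendar days in between. It assumes that 01/01/2020 and 12/31/2021 are the earliest and latest dates in the data, which the report suggests but does not state.

```python
from datetime import datetime

# Parse MM/DD/YYYY strings explicitly instead of relying on format guessing.
first = datetime.strptime("01/01/2020", "%m/%d/%Y").date()
last = datetime.strptime("12/31/2021", "%m/%d/%Y").date()

# Inclusive number of calendar days spanned (2020 is a leap year).
n_days = (last - first).days + 1
print(n_days)
```

Under that assumption the span covers 731 days, the same as the number of distinct Date values, so every calendar day would appear at least once.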

Common Values

Value               Count  Frequency (%)
12/31/2021          10     0.2%
08/05/2021          10     0.2%
08/16/2021          10     0.2%
08/15/2021          10     0.2%
08/14/2021          10     0.2%
08/13/2021          10     0.2%
08/12/2021          10     0.2%
08/11/2021          10     0.2%
08/10/2021          10     0.2%
08/09/2021          10     0.2%
Other values (721)  6511   98.5%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
12/31/2021          10     0.2%
12/11/2021          10     0.2%
11/11/2021          10     0.2%
12/04/2021          10     0.2%
12/05/2021          10     0.2%
12/06/2021          10     0.2%
12/07/2021          10     0.2%
12/30/2021          10     0.2%
12/29/2021          10     0.2%
12/28/2021          10     0.2%
Other values (721)  6511   98.5%

Most occurring characters

Value  Count  Frequency (%)
0      17815  26.9%
2      17115  25.9%
/      13222  20.0%
1      9182   13.9%
3      1544   2.3%
8      1232   1.9%
7      1220   1.8%
5      1214   1.8%
4      1200   1.8%
9      1200   1.8%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     52888  80.0%
Other Punctuation  13222  20.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      17815  33.7%
2      17115  32.4%
1      9182   17.4%
3      1544   2.9%
8      1232   2.3%
7      1220   2.3%
5      1214   2.3%
4      1200   2.3%
9      1200   2.3%
6      1166   2.2%

Other Punctuation
Value  Count  Frequency (%)
/      13222  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  66110  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      17815  26.9%
2      17115  25.9%
/      13222  20.0%
1      9182   13.9%
3      1544   2.3%
8      1232   1.9%
7      1220   1.8%
5      1214   1.8%
4      1200   1.8%
9      1200   1.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  66110  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      17815  26.9%
2      17115  25.9%
/      13222  20.0%
1      9182   13.9%
3      1544   2.3%
8      1232   1.9%
7      1220   1.8%
5      1214   1.8%
4      1200   1.8%
9      1200   1.8%

Source
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 387.5 KiB

AQS: 6611

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 19833
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: AQS
2nd row: AQS
3rd row: AQS
4th row: AQS
5th row: AQS

Common Values

Value  Count  Frequency (%)
AQS    6611   100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
aqs    6611   100.0%

Most occurring characters

Value  Count  Frequency (%)
A      6611   33.3%
Q      6611   33.3%
S      6611   33.3%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  19833  100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
A      6611   33.3%
Q      6611   33.3%
S      6611   33.3%

Most occurring scripts

Value  Count  Frequency (%)
Latin  19833  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
A      6611   33.3%
Q      6611   33.3%
S      6611   33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  19833  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
A      6611   33.3%
Q      6611   33.3%
S      6611   33.3%

Site ID
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 360571507.1
Minimum: 360050133
Maximum: 361030044
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 51.8 KiB

Quantile statistics

Minimum: 360050133
5-th percentile: 360050133
Q1: 360290023
median: 360551007
Q3: 360810125
95-th percentile: 361010003
Maximum: 361030044
Range: 979911
Interquartile range (IQR): 520102

Descriptive statistics

Standard deviation: 295431.0967
Coefficient of variation (CV): 0.0008193412148
Kurtosis: -0.9307721959
Mean: 360571507.1
Median Absolute Deviation (MAD): 259118
Skewness: -0.1690481293
Sum: 2.383738234 × 10^12
Variance: 8.727953289 × 10^10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=10)

Value      Count  Frequency (%)
360551007  727    11.0%
360610135  724    11.0%
360050133  721    10.9%
360810124  710    10.7%
360290023  709    10.7%
360550015  703    10.6%
361010003  702    10.6%
360810125  687    10.4%
360290005  661    10.0%
361030044  267    4.0%

Value      Count  Frequency (%)
360050133  721    10.9%
360290005  661    10.0%
360290023  709    10.7%
360550015  703    10.6%
360551007  727    11.0%
360610135  724    11.0%
360810124  710    10.7%
360810125  687    10.4%
361010003  702    10.6%
361030044  267    4.0%

Value      Count  Frequency (%)
361030044  267    4.0%
361010003  702    10.6%
360810125  687    10.4%
360810124  710    10.7%
360610135  724    11.0%
360551007  727    11.0%
360550015  703    10.6%
360290023  709    10.7%
360290005  661    10.0%
360050133  721    10.9%
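
Site ID is profiled as a real number, but the values behave like composite identifiers. The sketch below assumes the usual AQS layout of a 2-digit state code, a 3-digit county code and a 4-digit site number. The layout is an assumption that the report does not state, and `split_site_id` is a hypothetical helper.

```python
def split_site_id(site_id: int) -> tuple[str, str, str]:
    """Split a 9-digit site ID into (state, county, site number) strings."""
    text = f"{site_id:09d}"
    return text[:2], text[2:5], text[5:]

# Site IDs listed in the frequency table of this report.
site_ids = [
    360050133, 360290005, 360290023, 360550015, 360551007,
    360610135, 360810124, 360810125, 361010003, 361030044,
]

for sid in site_ids:
    print(sid, split_site_id(sid))

county_codes = {int(split_site_id(sid)[1]) for sid in site_ids}
```

With this split, every ID starts with state code 36 and the embedded county codes are 5, 29, 55, 61, 81, 101 and 103, the same values reported for STATE_CODE and COUNTY_CODE. That would explain the high-correlation alerts, and it suggests that the mean, variance and other numeric statistics of Site ID have no physical meaning.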

POC
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 374.6 KiB

1: 6611

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6611
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      6611   100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
1      6611   100.0%

Most occurring characters

Value  Count  Frequency (%)
1      6611   100.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  6611   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      6611   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  6611   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      6611   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6611   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      6611   100.0%

Daily Max 8-hour CO Concentration
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 20
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2988201482
Minimum: 0
Maximum: 1.9
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 51.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.1
Q1: 0.2
median: 0.3
Q3: 0.3
95-th percentile: 0.6
Maximum: 1.9
Range: 1.9
Interquartile range (IQR): 0.1

Descriptive statistics

Standard deviation: 0.1573854059
Coefficient of variation (CV): 0.5266894043
Kurtosis: 12.96018086
Mean: 0.2988201482
Median Absolute Deviation (MAD): 0.1
Skewness: 2.594986045
Sum: 1975.5
Variance: 0.02477016598
Monotonicity: Not monotonic
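
Several of the descriptive statistics are tied together by simple identities: mean = sum / n, CV = standard deviation / mean, and variance = standard deviation squared. The sketch below re-derives them from the figures in this report; the variable names are illustrative.

```python
# Figures copied from the report for Daily Max 8-hour CO Concentration.
n = 6611               # number of observations
total = 1975.5         # "Sum"
std = 0.1573854059     # "Standard deviation"

mean = total / n       # arithmetic mean
cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation

print(f"mean={mean:.10f} cv={cv:.10f} variance={variance:.11f}")
```

The derived values agree with the reported Mean, CV and Variance to within the rounding of the displayed figures.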
Histogram with fixed size bins (bins=20)

Value              Count  Frequency (%)
0.2                2258   34.2%
0.3                2196   33.2%
0.4                897    13.6%
0.1                548    8.3%
0.5                345    5.2%
0.6                149    2.3%
0.7                75     1.1%
0.8                52     0.8%
0.9                34     0.5%
1                  20     0.3%
Other values (10)  37     0.6%

Value  Count  Frequency (%)
0      2      < 0.1%
0.1    548    8.3%
0.2    2258   34.2%
0.3    2196   33.2%
0.4    897    13.6%
0.5    345    5.2%
0.6    149    2.3%
0.7    75     1.1%
0.8    52     0.8%
0.9    34     0.5%

Value  Count  Frequency (%)
1.9    1      < 0.1%
1.8    2      < 0.1%
1.7    1      < 0.1%
1.6    2      < 0.1%
1.5    2      < 0.1%
1.4    3      < 0.1%
1.3    2      < 0.1%
1.2    5      0.1%
1.1    17     0.3%
1      20     0.3%

UNITS
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 387.5 KiB

ppm: 6611

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 19833
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ppm
2nd row: ppm
3rd row: ppm
4th row: ppm
5th row: ppm

Common Values

Value  Count  Frequency (%)
ppm    6611   100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
ppm    6611   100.0%

Most occurring characters

Value  Count  Frequency (%)
p      13222  66.7%
m      6611   33.3%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  19833  100.0%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
p      13222  66.7%
m      6611   33.3%

Most occurring scripts

Value  Count  Frequency (%)
Latin  19833  100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
p      13222  66.7%
m      6611   33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  19833  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
p      13222  66.7%
m      6611   33.3%

DAILY_AQI_VALUE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 20
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.236726668
Minimum: 0
Maximum: 22
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 51.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 3
Q3: 3
95-th percentile: 7
Maximum: 22
Range: 22
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.944950511
Coefficient of variation (CV): 0.60090045
Kurtosis: 9.35313006
Mean: 3.236726668
Median Absolute Deviation (MAD): 1
Skewness: 2.277262232
Sum: 21398
Variance: 3.782832491
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=20)

Value              Count  Frequency (%)
2                  2258   34.2%
3                  2196   33.2%
5                  897    13.6%
1                  548    8.3%
6                  345    5.2%
7                  149    2.3%
8                  75     1.1%
9                  52     0.8%
10                 34     0.5%
11                 20     0.3%
Other values (10)  37     0.6%

Value  Count  Frequency (%)
0      2      < 0.1%
1      548    8.3%
2      2258   34.2%
3      2196   33.2%
5      897    13.6%
6      345    5.2%
7      149    2.3%
8      75     1.1%
9      52     0.8%
10     34     0.5%

Value  Count  Frequency (%)
22     1      < 0.1%
20     2      < 0.1%
19     1      < 0.1%
18     2      < 0.1%
17     2      < 0.1%
16     3      < 0.1%
15     2      < 0.1%
14     5      0.1%
13     17     0.3%
11     20     0.3%
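
The counts for DAILY_AQI_VALUE match those for Daily Max 8-hour CO Concentration value by value (for example 0.4 ppm and AQI 5 both occur 897 times), which is consistent with the AQI being a deterministic function of the concentration. The sketch below assumes the piecewise-linear AQI formula with a lowest CO segment of 0.0 to 4.4 ppm mapping to AQI 0 to 50, rounded half up. Both the breakpoints and the rounding rule are assumptions that the report does not state, and `co_aqi` is a hypothetical helper covering only that one segment.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed lowest CO segment: 0.0-4.4 ppm maps linearly to AQI 0-50.
C_LOW, C_HIGH = Decimal("0.0"), Decimal("4.4")
I_LOW, I_HIGH = Decimal("0"), Decimal("50")

def co_aqi(concentration_ppm: str) -> int:
    """AQI for a CO concentration (decimal string) within the first segment."""
    c = Decimal(concentration_ppm)
    if not (C_LOW <= c <= C_HIGH):
        raise ValueError("concentration outside the single segment sketched here")
    # Multiply before dividing so that exact cases such as 1.1 ppm stay exact.
    index = (c - C_LOW) * (I_HIGH - I_LOW) / (C_HIGH - C_LOW) + I_LOW
    return int(index.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

for ppm in ["0.1", "0.4", "1.0", "1.1", "1.9"]:
    print(ppm, co_aqi(ppm))
```

Decimal arithmetic is used because 1.1 ppm lands exactly on 12.5, where binary floats and round-half-to-even would give 12 instead of the 13 seen in the table.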

Site Name
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 461.2 KiB

ROCHESTER 2: 727
CCNY: 724
PFIZER LAB SITE: 721
QUEENS COLLEGE 2: 710
Buffalo Near-Road: 709
Other values (5): 3020

Length

Max length: 24
Median length: 16
Mean length: 14.42051127
Min length: 4

Characters and Unicode

Total characters: 95334
Distinct characters: 38
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PFIZER LAB SITE
2nd row: PFIZER LAB SITE
3rd row: PFIZER LAB SITE
4th row: PFIZER LAB SITE
5th row: PFIZER LAB SITE

Common Values

Value                     Count  Frequency (%)
ROCHESTER 2               727    11.0%
CCNY                      724    11.0%
PFIZER LAB SITE           721    10.9%
QUEENS COLLEGE 2          710    10.7%
Buffalo Near-Road         709    10.7%
Rochester Near-Road       703    10.6%
PINNACLE STATE PARK       702    10.6%
Queens College Near Road  687    10.4%
BUFFALO                   661    10.0%
Flax Pond                 267    4.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value             Count  Frequency (%)
2                 1437   9.4%
rochester         1430   9.3%
near-road         1412   9.2%
queens            1397   9.1%
college           1397   9.1%
buffalo           1370   8.9%
ccny              724    4.7%
site              721    4.7%
lab               721    4.7%
pfizer            721    4.7%
Other values (7)  4014   26.2%
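
The word counts in the category frequency table can be reproduced from the Common Values table. The sketch below assumes each site name is lower-cased and split on whitespace, with every word weighted by the number of rows carrying that name. This tokenisation is inferred from the figures and is not confirmed by the report.

```python
from collections import Counter

# Site names and row counts copied from the Common Values table.
site_rows = {
    "ROCHESTER 2": 727,
    "CCNY": 724,
    "PFIZER LAB SITE": 721,
    "QUEENS COLLEGE 2": 710,
    "Buffalo Near-Road": 709,
    "Rochester Near-Road": 703,
    "PINNACLE STATE PARK": 702,
    "Queens College Near Road": 687,
    "BUFFALO": 661,
    "Flax Pond": 267,
}

# Weight every lower-cased, whitespace-separated word by its row count.
word_counts = Counter()
for name, n_rows in site_rows.items():
    for word in name.lower().split():
        word_counts[word] += n_rows

print(word_counts.most_common(6))
```

For example, "rochester" totals 727 + 703 = 1430 and "near-road" totals 709 + 703 = 1412, as in the table. "Near Road" without the hyphen counts as two separate words.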

Most occurring characters

Value              Count  Frequency (%)
(space)            8733   9.2%
E                  7140   7.5%
e                  6253   6.6%
R                  5679   6.0%
a                  5174   5.4%
N                  4937   5.2%
o                  4465   4.7%
C                  4274   4.5%
L                  3504   3.7%
A                  3488   3.7%
Other values (28)  41687  43.7%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  52121  54.7%
Lowercase Letter  31631  33.2%
Space Separator   8733   9.2%
Decimal Number    1437   1.5%
Dash Punctuation  1412   1.5%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
E                  7140   13.7%
R                  5679   10.9%
N                  4937   9.5%
C                  4274   8.2%
L                  3504   6.7%
A                  3488   6.7%
S                  2860   5.5%
T                  2852   5.5%
P                  2392   4.6%
F                  2310   4.4%
Other values (10)  12685  24.3%

Lowercase Letter
Value             Count  Frequency (%)
e                 6253   19.8%
a                 5174   16.4%
o                 4465   14.1%
r                 2802   8.9%
d                 2366   7.5%
l                 2350   7.4%
f                 1418   4.5%
u                 1396   4.4%
s                 1390   4.4%
n                 954    3.0%
Other values (5)  3063   9.7%

Space Separator
Value    Count  Frequency (%)
(space)  8733   100.0%

Decimal Number
Value  Count  Frequency (%)
2      1437   100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      1412   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   83752  87.9%
Common  11582  12.1%

Most frequent character per script

Latin
Value              Count  Frequency (%)
E                  7140   8.5%
e                  6253   7.5%
R                  5679   6.8%
a                  5174   6.2%
N                  4937   5.9%
o                  4465   5.3%
C                  4274   5.1%
L                  3504   4.2%
A                  3488   4.2%
S                  2860   3.4%
Other values (25)  35978  43.0%

Common
Value    Count  Frequency (%)
(space)  8733   75.4%
2        1437   12.4%
-        1412   12.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  95334  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            8733   9.2%
E                  7140   7.5%
e                  6253   6.6%
R                  5679   6.0%
a                  5174   5.4%
N                  4937   5.2%
o                  4465   4.7%
C                  4274   4.5%
L                  3504   3.7%
A                  3488   3.7%
Other values (28)  41687  43.7%

DAILY_OBS_COUNT
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 24
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.20450764
Minimum: 1
Maximum: 24
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 51.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 18
Q1: 24
median: 24
Q3: 24
95-th percentile: 24
Maximum: 24
Range: 23
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.788226427
Coefficient of variation (CV): 0.1201588273
Kurtosis: 22.64665811
Mean: 23.20450764
Median Absolute Deviation (MAD): 0
Skewness: -4.407432971
Sum: 153405
Variance: 7.774206607
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=24)

Value              Count  Frequency (%)
24                 5968   90.3%
18                 284    4.3%
17                 87     1.3%
16                 40     0.6%
19                 35     0.5%
15                 31     0.5%
3                  20     0.3%
13                 17     0.3%
14                 16     0.2%
12                 14     0.2%
Other values (14)  99     1.5%

Value  Count  Frequency (%)
1      4      0.1%
2      4      0.1%
3      20     0.3%
4      9      0.1%
5      4      0.1%
6      9      0.1%
7      5      0.1%
8      9      0.1%
9      6      0.1%
10     10     0.2%

Value  Count  Frequency (%)
24     5968   90.3%
23     9      0.1%
22     4      0.1%
21     13     0.2%
20     5      0.1%
19     35     0.5%
18     284    4.3%
17     87     1.3%
16     40     0.6%
15     31     0.5%

PERCENT_COMPLETE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 24
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 96.6933898
Minimum: 4
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 51.8 KiB

Quantile statistics

Minimum: 4
5-th percentile: 75
Q1: 100
median: 100
Q3: 100
95-th percentile: 100
Maximum: 100
Range: 96
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 11.59098391
Coefficient of variation (CV): 0.1198735914
Kurtosis: 22.65218537
Mean: 96.6933898
Median Absolute Deviation (MAD): 0
Skewness: -4.407874457
Sum: 639240
Variance: 134.3509079
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=24)

Value              Count  Frequency (%)
100                5968   90.3%
75                 284    4.3%
71                 87     1.3%
67                 40     0.6%
79                 35     0.5%
63                 31     0.5%
13                 20     0.3%
54                 17     0.3%
58                 16     0.2%
50                 14     0.2%
Other values (14)  99     1.5%

Value  Count  Frequency (%)
4      4      0.1%
8      4      0.1%
13     20     0.3%
17     9      0.1%
21     4      0.1%
25     9      0.1%
29     5      0.1%
33     9      0.1%
38     6      0.1%
42     10     0.2%

Value  Count  Frequency (%)
100    5968   90.3%
96     9      0.1%
92     4      0.1%
88     13     0.2%
83     5      0.1%
79     35     0.5%
75     284    4.3%
71     87     1.3%
67     40     0.6%
63     31     0.5%
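
The value counts of PERCENT_COMPLETE mirror those of DAILY_OBS_COUNT one to one (24 and 100 both occur 5968 times, 18 and 75 both occur 284 times, and so on), which fits the high-correlation alert between the two. The sketch below assumes PERCENT_COMPLETE is the share of 24 possible hourly observations, rounded half up. That rule is inferred from the tables and is not stated in the report; `percent_complete` is a hypothetical helper.

```python
HOURS_PER_DAY = 24

def percent_complete(obs_count: int) -> int:
    """Share of 24 hourly observations present, in percent, rounded half up."""
    # Integer arithmetic avoids float rounding: floor(100 * c / 24 + 1/2).
    return (200 * obs_count + HOURS_PER_DAY) // (2 * HOURS_PER_DAY)

for count in (1, 3, 15, 18, 24):
    print(count, percent_complete(count))
```

Half-up rounding matters for counts such as 3 (12.5%) and 15 (62.5%), which appear as 13 and 63 in the table.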

AQS_PARAMETER_CODE
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.4 KiB

42101: 6611

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 33055
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 42101
2nd row: 42101
3rd row: 42101
4th row: 42101
5th row: 42101

Common Values

Value  Count  Frequency (%)
42101  6611   100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
42101  6611   100.0%

Most occurring characters

Value  Count  Frequency (%)
1      13222  40.0%
4      6611   20.0%
2      6611   20.0%
0      6611   20.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  33055  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      13222  40.0%
4      6611   20.0%
2      6611   20.0%
0      6611   20.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  33055  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      13222  40.0%
4      6611   20.0%
2      6611   20.0%
0      6611   20.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  33055  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      13222  40.0%
4      6611   20.0%
2      6611   20.0%
0      6611   20.0%

AQS_PARAMETER_DESC
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 465.0 KiB

Carbon monoxide: 6611

Length

Max length: 15
Median length: 15
Mean length: 15
Min length: 15

Characters and Unicode

Total characters: 99165
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Carbon monoxide
2nd row: Carbon monoxide
3rd row: Carbon monoxide
4th row: Carbon monoxide
5th row: Carbon monoxide

Common Values

Value            Count  Frequency (%)
Carbon monoxide  6611   100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value     Count  Frequency (%)
carbon    6611   50.0%
monoxide  6611   50.0%

Most occurring characters

Value             Count  Frequency (%)
o                 19833  20.0%
n                 13222  13.3%
C                 6611   6.7%
a                 6611   6.7%
r                 6611   6.7%
b                 6611   6.7%
(space)           6611   6.7%
m                 6611   6.7%
x                 6611   6.7%
i                 6611   6.7%
Other values (2)  13222  13.3%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  85943  86.7%
Uppercase Letter  6611   6.7%
Space Separator   6611   6.7%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
o      19833  23.1%
n      13222  15.4%
a      6611   7.7%
r      6611   7.7%
b      6611   7.7%
m      6611   7.7%
x      6611   7.7%
i      6611   7.7%
d      6611   7.7%
e      6611   7.7%

Uppercase Letter
Value  Count  Frequency (%)
C      6611   100.0%

Space Separator
Value    Count  Frequency (%)
(space)  6611   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   92554  93.3%
Common  6611   6.7%

Most frequent character per script

Latin
Value  Count  Frequency (%)
o      19833  21.4%
n      13222  14.3%
C      6611   7.1%
a      6611   7.1%
r      6611   7.1%
b      6611   7.1%
m      6611   7.1%
x      6611   7.1%
i      6611   7.1%
d      6611   7.1%

Common
Value    Count  Frequency (%)
(space)  6611   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  99165  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
o                 19833  20.0%
n                 13222  13.3%
C                 6611   6.7%
a                 6611   6.7%
r                 6611   6.7%
b                 6611   6.7%
(space)           6611   6.7%
m                 6611   6.7%
x                 6611   6.7%
i                 6611   6.7%
Other values (2)  13222  13.3%

CBSA_CODE
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 400.4 KiB

35620: 3109
40380: 1430
15380: 1370
18500: 702

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 33055
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 35620
2nd row: 35620
3rd row: 35620
4th row: 35620
5th row: 35620

Common Values

Value  Count  Frequency (%)
35620  3109   47.0%
40380  1430   21.6%
15380  1370   20.7%
18500  702    10.6%
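
The Frequency (%) column is each count divided by the number of observations. A short sketch that recomputes the shares from the four CBSA_CODE counts (the variable names are illustrative):

```python
# Counts copied from the Common Values table for CBSA_CODE.
counts = {"35620": 3109, "40380": 1430, "15380": 1370, "18500": 702}

n_rows = sum(counts.values())   # should equal the number of observations
shares = {code: f"{100 * c / n_rows:.1f}%" for code, c in counts.items()}

print(n_rows, shares)
```

The four counts add up to all 6611 rows, so every record carries one of these four codes.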

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
35620  3109   47.0%
40380  1430   21.6%
15380  1370   20.7%
18500  702    10.6%

Most occurring characters

Value  Count  Frequency (%)
0      8743   26.4%
3      5909   17.9%
5      5181   15.7%
8      3502   10.6%
6      3109   9.4%
2      3109   9.4%
1      2072   6.3%
4      1430   4.3%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  33055  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      8743   26.4%
3      5909   17.9%
5      5181   15.7%
8      3502   10.6%
6      3109   9.4%
2      3109   9.4%
1      2072   6.3%
4      1430   4.3%

Most occurring scripts

Value   Count  Frequency (%)
Common  33055  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      8743   26.4%
3      5909   17.9%
5      5181   15.7%
8      3502   10.6%
6      3109   9.4%
2      3109   9.4%
1      2072   6.3%
4      1430   4.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  33055  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      8743   26.4%
3      5909   17.9%
5      5181   15.7%
8      3502   10.6%
6      3109   9.4%
2      3109   9.4%
1      2072   6.3%
4      1430   4.3%

CBSA_NAME
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 555.7 KiB

New York-Newark-Jersey City, NY-NJ-PA: 3109
Rochester, NY: 1430
Buffalo-Cheektowaga-Niagara Falls, NY: 1370
Corning, NY: 702

Length

Max length: 37
Median length: 37
Mean length: 29.04779912
Min length: 11

Characters and Unicode

Total characters: 192035
Distinct characters: 29
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: New York-Newark-Jersey City, NY-NJ-PA
2nd row: New York-Newark-Jersey City, NY-NJ-PA
3rd row: New York-Newark-Jersey City, NY-NJ-PA
4th row: New York-Newark-Jersey City, NY-NJ-PA
5th row: New York-Newark-Jersey City, NY-NJ-PA

Common Values

Value                                  Count  Frequency (%)
New York-Newark-Jersey City, NY-NJ-PA  3109   47.0%
Rochester, NY                          1430   21.6%
Buffalo-Cheektowaga-Niagara Falls, NY  1370   20.7%
Corning, NY                            702    10.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value                        Count  Frequency (%)
ny                           3502   16.8%
new                          3109   14.9%
york-newark-jersey           3109   14.9%
city                         3109   14.9%
ny-nj-pa                     3109   14.9%
rochester                    1430   6.9%
buffalo-cheektowaga-niagara  1370   6.6%
falls                        1370   6.6%
corning                      702    3.4%

Most occurring characters

Value              Count  Frequency (%)
e                  18036  9.4%
N                  17308  9.0%
-                  15176  7.9%
(space)            14199  7.4%
r                  12829  6.7%
a                  12699  6.6%
Y                  9720   5.1%
o                  7981   4.2%
w                  7588   4.0%
k                  7588   4.0%
Other values (19)  68911  35.9%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   107234  55.8%
Uppercase Letter   48815   25.4%
Dash Punctuation   15176   7.9%
Space Separator    14199   7.4%
Other Punctuation  6611    3.4%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
e                 18036  16.8%
r                 12829  12.0%
a                 12699  11.8%
o                 7981   7.4%
w                 7588   7.1%
k                 7588   7.1%
y                 6218   5.8%
t                 5909   5.5%
s                 5909   5.5%
i                 5181   4.8%
Other values (7)  17296  16.1%

Uppercase Letter
Value  Count  Frequency (%)
N      17308  35.5%
Y      9720   19.9%
J      6218   12.7%
C      5181   10.6%
P      3109   6.4%
A      3109   6.4%
R      1430   2.9%
B      1370   2.8%
F      1370   2.8%

Dash Punctuation
Value  Count  Frequency (%)
-      15176  100.0%

Space Separator
Value    Count  Frequency (%)
(space)  14199  100.0%

Other Punctuation
Value  Count  Frequency (%)
,      6611   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   156049  81.3%
Common  35986   18.7%

Most frequent character per script

Latin
Value              Count  Frequency (%)
e                  18036  11.6%
N                  17308  11.1%
r                  12829  8.2%
a                  12699  8.1%
Y                  9720   6.2%
o                  7981   5.1%
w                  7588   4.9%
k                  7588   4.9%
J                  6218   4.0%
y                  6218   4.0%
Other values (16)  49864  32.0%

Common
Value    Count  Frequency (%)
-        15176  42.2%
(space)  14199  39.5%
,        6611   18.4%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  192035  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
e                  18036  9.4%
N                  17308  9.0%
-                  15176  7.9%
(space)            14199  7.4%
r                  12829  6.7%
a                  12699  6.6%
Y                  9720   5.1%
o                  7981   4.2%
w                  7588   4.0%
k                  7588   4.0%
Other values (19)  68911  35.9%

STATE_CODE
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 381.0 KiB

36: 6611

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 13222
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 36
2nd row: 36
3rd row: 36
4th row: 36
5th row: 36

Common Values

Value  Count  Frequency (%)
36     6611   100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
36     6611   100.0%

Most occurring characters

Value  Count  Frequency (%)
3      6611   50.0%
6      6611   50.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  13222  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
3      6611   50.0%
6      6611   50.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  13222  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
3      6611   50.0%
6      6611   50.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  13222  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
3      6611   50.0%
6      6611   50.0%

STATE
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 419.8 KiB

New York: 6611

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 52888
Distinct characters: 8
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: New York
2nd row: New York
3rd row: New York
4th row: New York
5th row: New York

Common Values

Value     Count  Frequency (%)
New York  6611   100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
new    6611   50.0%
york   6611   50.0%

Most occurring characters

Value    Count  Frequency (%)
N        6611   12.5%
e        6611   12.5%
w        6611   12.5%
(space)  6611   12.5%
Y        6611   12.5%
o        6611   12.5%
r        6611   12.5%
k        6611   12.5%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  33055  62.5%
Uppercase Letter  13222  25.0%
Space Separator   6611   12.5%

Most frequent character per category

Lowercase Letter
Value  Count  Frequency (%)
e      6611   20.0%
w      6611   20.0%
o      6611   20.0%
r      6611   20.0%
k      6611   20.0%

Uppercase Letter
Value  Count  Frequency (%)
N      6611   50.0%
Y      6611   50.0%

Space Separator
Value    Count  Frequency (%)
(space)  6611   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   46277  87.5%
Common  6611   12.5%

Most frequent character per script

Latin
Value  Count  Frequency (%)
N      6611   14.3%
e      6611   14.3%
w      6611   14.3%
Y      6611   14.3%
o      6611   14.3%
r      6611   14.3%
k      6611   14.3%

Common
Value    Count  Frequency (%)
(space)  6611   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  52888  100.0%

Most frequent character per block

ASCII
Value    Count  Frequency (%)
N        6611   12.5%
e        6611   12.5%
w        6611   12.5%
(space)  6611   12.5%
Y        6611   12.5%
o        6611   12.5%
r        6611   12.5%
k        6611   12.5%

COUNTY_CODE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct7
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean57.13341401
Minimum5
Maximum103
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size51.8 KiB
2024-05-07T15:18:13.508499image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/

Quantile statistics

Minimum5
5-th percentile5
Q129
median55
Q381
95-th percentile101
Maximum103
Range98
Interquartile range (IQR)52

Descriptive statistics

Standard deviation29.54410618
Coefficient of variation (CV)0.5171073127
Kurtosis-0.9306234125
Mean57.13341401
Median Absolute Deviation (MAD)26
Skewness-0.1679934489
Sum377709
Variance872.8542101
MonotonicityNot monotonic
2024-05-07T15:18:13.639538image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
Histogram with fixed size bins (bins=7)
Value | Count | Frequency (%)
55 | 1430 | 21.6%
81 | 1397 | 21.1%
29 | 1370 | 20.7%
61 | 724 | 11.0%
5 | 721 | 10.9%
101 | 702 | 10.6%
103 | 267 | 4.0%

Value | Count | Frequency (%)
5 | 721 | 10.9%
29 | 1370 | 20.7%
55 | 1430 | 21.6%
61 | 724 | 11.0%
81 | 1397 | 21.1%
101 | 702 | 10.6%
103 | 267 | 4.0%

Value | Count | Frequency (%)
103 | 267 | 4.0%
101 | 702 | 10.6%
81 | 1397 | 21.1%
61 | 724 | 11.0%
55 | 1430 | 21.6%
29 | 1370 | 20.7%
5 | 721 | 10.9%

COUNTY
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 405.8 KiB
Monroe: 1430
Queens: 1397
Erie: 1370
New York: 724
Bronx: 721
Other values (2): 969

Length

Max length: 8
Median length: 7
Mean length: 5.84208138
Min length: 4

Characters and Unicode

Total characters: 38622
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
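
As a rough illustration of how such character properties can be tallied, the sketch below uses Python's standard `unicodedata` module on the county names and row counts listed in this report. The dictionary and variable names are illustrative assumptions; the report's own implementation may differ.

```python
import unicodedata
from collections import Counter

# County names and their row counts, as listed under "Common Values".
county_counts = {
    "Monroe": 1430, "Queens": 1397, "Erie": 1370, "New York": 724,
    "Bronx": 721, "Steuben": 702, "Suffolk": 267,
}

# Tally Unicode general categories, weighting each character by the
# number of rows in which its county name appears.
category_totals = Counter()
for name, rows in county_counts.items():
    for ch in name:
        category_totals[unicodedata.category(ch)] += rows

total_chars = sum(category_totals.values())
distinct_chars = len(set("".join(county_counts)))

# "Ll" = Lowercase Letter, "Lu" = Uppercase Letter, "Zs" = Space Separator.
print(dict(category_totals), total_chars, distinct_chars)
```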

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Bronx
2nd row: Bronx
3rd row: Bronx
4th row: Bronx
5th row: Bronx

Common Values

Value | Count | Frequency (%)
Monroe | 1430 | 21.6%
Queens | 1397 | 21.1%
Erie | 1370 | 20.7%
New York | 724 | 11.0%
Bronx | 721 | 10.9%
Steuben | 702 | 10.6%
Suffolk | 267 | 4.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
monroe | 1430 | 19.5%
queens | 1397 | 19.0%
erie | 1370 | 18.7%
new | 724 | 9.9%
york | 724 | 9.9%
bronx | 721 | 9.8%
steuben | 702 | 9.6%
suffolk | 267 | 3.6%

Most occurring characters

Value | Count | Frequency (%)
e | 7722 | 20.0%
o | 4572 | 11.8%
n | 4250 | 11.0%
r | 4245 | 11.0%
u | 2366 | 6.1%
M | 1430 | 3.7%
Q | 1397 | 3.6%
s | 1397 | 3.6%
E | 1370 | 3.5%
i | 1370 | 3.5%
Other values (12) | 8503 | 22.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 30563 | 79.1%
Uppercase Letter | 7335 | 19.0%
Space Separator | 724 | 1.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 7722 | 25.3%
o | 4572 | 15.0%
n | 4250 | 13.9%
r | 4245 | 13.9%
u | 2366 | 7.7%
s | 1397 | 4.6%
i | 1370 | 4.5%
k | 991 | 3.2%
w | 724 | 2.4%
x | 721 | 2.4%
Other values (4) | 2205 | 7.2%
Uppercase Letter
Value | Count | Frequency (%)
M | 1430 | 19.5%
Q | 1397 | 19.0%
E | 1370 | 18.7%
S | 969 | 13.2%
Y | 724 | 9.9%
N | 724 | 9.9%
B | 721 | 9.8%
Space Separator
Value | Count | Frequency (%)
(space) | 724 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 37898 | 98.1%
Common | 724 | 1.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 7722 | 20.4%
o | 4572 | 12.1%
n | 4250 | 11.2%
r | 4245 | 11.2%
u | 2366 | 6.2%
M | 1430 | 3.8%
Q | 1397 | 3.7%
s | 1397 | 3.7%
E | 1370 | 3.6%
i | 1370 | 3.6%
Other values (11) | 7779 | 20.5%
Common
Value | Count | Frequency (%)
(space) | 724 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 38622 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 7722 | 20.0%
o | 4572 | 11.8%
n | 4250 | 11.0%
r | 4245 | 11.0%
u | 2366 | 6.1%
M | 1430 | 3.7%
Q | 1397 | 3.6%
s | 1397 | 3.6%
E | 1370 | 3.5%
i | 1370 | 3.5%
Other values (12) | 8503 | 22.0%

SITE_LATITUDE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.88254144
Minimum: 40.73614
Maximum: 43.14618
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 51.8 KiB

Quantile statistics

Minimum: 40.73614
5-th percentile: 40.73614
Q1: 40.81976
Median: 42.09142
Q3: 42.92110728
95-th percentile: 43.14618
Maximum: 43.14618
Range: 2.41004
Interquartile range (IQR): 2.10134728

Descriptive statistics

Standard deviation: 1.054355422
Coefficient of variation (CV): 0.02517410324
Kurtosis: -1.850515277
Mean: 41.88254144
Median Absolute Deviation (MAD): 1.05476
Skewness: 0.05222111795
Sum: 276885.4814
Variance: 1.111665356
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
43.14618 | 727 | 11.0%
40.81976 | 724 | 11.0%
40.8679 | 721 | 10.9%
40.73614 | 710 | 10.7%
42.92110728 | 709 | 10.7%
43.14501268 | 703 | 10.6%
42.09142 | 702 | 10.6%
40.739264 | 687 | 10.4%
42.87690667 | 661 | 10.0%
40.961017 | 267 | 4.0%

Value | Count | Frequency (%)
40.73614 | 710 | 10.7%
40.739264 | 687 | 10.4%
40.81976 | 724 | 11.0%
40.8679 | 721 | 10.9%
40.961017 | 267 | 4.0%
42.09142 | 702 | 10.6%
42.87690667 | 661 | 10.0%
42.92110728 | 709 | 10.7%
43.14501268 | 703 | 10.6%
43.14618 | 727 | 11.0%

Value | Count | Frequency (%)
43.14618 | 727 | 11.0%
43.14501268 | 703 | 10.6%
42.92110728 | 709 | 10.7%
42.87690667 | 661 | 10.0%
42.09142 | 702 | 10.6%
40.961017 | 267 | 4.0%
40.8679 | 721 | 10.9%
40.81976 | 724 | 11.0%
40.739264 | 687 | 10.4%
40.73614 | 710 | 10.7%

SITE_LONGITUDE
Real number (ℝ)

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -76.00944233
Minimum: -78.80952601
Maximum: -73.139046
Zeros: 0
Zeros (%): 0.0%
Negative: 6611
Negative (%): 100.0%
Memory size: 51.8 KiB

Quantile statistics

Minimum: -78.80952601
5-th percentile: -78.80952601
Q1: -77.55728026
Median: -77.20978
Q3: -73.82153
95-th percentile: -73.817694
Maximum: -73.139046
Range: 5.670480012
Interquartile range (IQR): 3.73575026

Descriptive statistics

Standard deviation: 2.138765962
Coefficient of variation (CV): -0.02813816147
Kurtosis: -1.776028237
Mean: -76.00944233
Median Absolute Deviation (MAD): 1.599746012
Skewness: -0.02358159539
Sum: -502498.4233
Variance: 4.574319839
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
-77.54817 | 727 | 11.0%
-73.94825 | 724 | 11.0%
-73.87809 | 721 | 10.9%
-73.82153 | 710 | 10.7%
-78.76582533 | 709 | 10.7%
-77.55728026 | 703 | 10.6%
-77.20978 | 702 | 10.6%
-73.817694 | 687 | 10.4%
-78.80952601 | 661 | 10.0%
-73.139046 | 267 | 4.0%

Value | Count | Frequency (%)
-78.80952601 | 661 | 10.0%
-78.76582533 | 709 | 10.7%
-77.55728026 | 703 | 10.6%
-77.54817 | 727 | 11.0%
-77.20978 | 702 | 10.6%
-73.94825 | 724 | 11.0%
-73.87809 | 721 | 10.9%
-73.82153 | 710 | 10.7%
-73.817694 | 687 | 10.4%
-73.139046 | 267 | 4.0%

Value | Count | Frequency (%)
-73.139046 | 267 | 4.0%
-73.817694 | 687 | 10.4%
-73.82153 | 710 | 10.7%
-73.87809 | 721 | 10.9%
-73.94825 | 724 | 11.0%
-77.20978 | 702 | 10.6%
-77.54817 | 727 | 11.0%
-77.55728026 | 703 | 10.6%
-78.76582533 | 709 | 10.7%
-78.80952601 | 661 | 10.0%

Interactions

[Figure: grid of 64 pairwise interaction plots (8 × 8 numeric variables); images not reproduced.]

Correlations


Auto

The auto setting is an easily interpretable pairwise column metric that selects a method according to the variable types of each pair: categorical-categorical uses Cramér's V, numerical-categorical uses Cramér's V (on a discretized numerical column), and numerical-numerical uses Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
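
The type-to-method mapping described above can be sketched as a small dispatch function. This is an illustrative sketch only: the function name `auto_method` and the string labels are hypothetical and are not taken from the profiling library.

```python
def auto_method(type_a: str, type_b: str) -> str:
    """Pick a correlation method from the variable types of a column pair.

    Hypothetical helper that mirrors the mapping described in the text;
    it only returns a label and does not compute anything.
    """
    pair = {type_a, type_b}
    if pair == {"numerical"}:
        return "spearman"
    if pair <= {"numerical", "categorical"}:
        # Covers categorical-categorical and numerical-categorical; in the
        # mixed case the numerical column would be discretized first.
        return "cramers_v"
    raise ValueError(f"unsupported variable types: {type_a}, {type_b}")


print(auto_method("numerical", "numerical"))
print(auto_method("numerical", "categorical"))
print(auto_method("categorical", "categorical"))
```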

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
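
The calculation described above can be sketched in plain Python as follows. This is a minimal illustration, assuming tied values receive the average of their ranks; the function names are illustrative and not part of the report's tooling.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def spearman_rho(x, y):
    """Covariance of the rank variables over the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    return covariance(rx, ry) / math.sqrt(covariance(rx, rx) * covariance(ry, ry))


# A nonlinear but strictly increasing relationship is perfectly monotonic.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```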

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear relationship the steepness of the slope does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
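
As a small worked example, the sketch below applies this formula to the ten "First rows" shown in the Sample section of this report (CO concentration against DAILY_AQI_VALUE). Ten rows are far too few to characterise the full dataset, so the result is illustrative only; the function name is an assumption.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their std devs.

    The 1/n factors in the covariance and standard deviations cancel,
    so plain sums of deviations are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Values copied from the ten "First rows" of the sample.
co_ppm = [0.2, 0.5, 0.6, 0.7, 0.3, 0.3, 0.4, 0.3, 0.3, 0.5]
aqi = [2, 6, 7, 8, 3, 3, 5, 3, 3, 6]

r = pearson_r(co_ppm, aqi)

# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([2 * a + 3 for a in co_ppm], [5 * b - 1 for b in aqi])

print(r, r_rescaled)
```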

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
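
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below. Note that this is an assumption about which variant is meant: tied pairs are counted as neither concordant nor discordant here, whereas libraries such as SciPy default to tau-b, which adjusts the denominator for ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Six pairs in total: five concordant and one discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```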

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
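
A minimal sketch of a bias-corrected Cramér's V in the style of Bergsma's correction is given below, using only the standard library. It is not necessarily the exact implementation behind this report; the function name is illustrative, and returning 0.0 for a constant column is a choice made for this sketch.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equal-length categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias correction applied to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a constant column carries no association information
    return math.sqrt(phi2_corr / denom)


perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["u"] * 50 + ["v"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(perfect, independent)
```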

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

Index | Date | Source | Site ID | POC | Daily Max 8-hour CO Concentration | UNITS | DAILY_AQI_VALUE | Site Name | DAILY_OBS_COUNT | PERCENT_COMPLETE | AQS_PARAMETER_CODE | AQS_PARAMETER_DESC | CBSA_CODE | CBSA_NAME | STATE_CODE | STATE | COUNTY_CODE | COUNTY | SITE_LATITUDE | SITE_LONGITUDE
0 | 01/01/2020 | AQS | 360050133 | 1 | 0.2 | ppm | 2 | PFIZER LAB SITE | 19 | 79.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 5 | Bronx | 40.8679 | -73.87809
1 | 01/02/2020 | AQS | 360050133 | 1 | 0.5 | ppm | 6 | PFIZER LAB SITE | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 5 | Bronx | 40.8679 | -73.87809
2 | 01/03/2020 | AQS | 360050133 | 1 | 0.6 | ppm | 7 | PFIZER LAB SITE | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 5 | Bronx | 40.8679 | -73.87809
3 | 01/04/2020 | AQS | 360050133 | 1 | 0.7 | ppm | 8 | PFIZER LAB SITE | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 5 | Bronx | 40.8679 | -73.87809
4 | 01/05/2020 | AQS | 360050133 | 1 | 0.3 | ppm | 3 | PFIZER LAB SITE | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 5 | Bronx | 40.8679 | -73.87809
5 | 01/06/2020 | AQS | 360050133 | 1 | 0.3 | ppm | 3 | PFIZER LAB SITE | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 5 | Bronx | 40.8679 | -73.87809
6 | 01/07/2020 | AQS | 360050133 | 1 | 0.4 | ppm | 5 | PFIZER LAB SITE | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 5 | Bronx | 40.8679 | -73.87809
7 | 01/08/2020 | AQS | 360050133 | 1 | 0.3 | ppm | 3 | PFIZER LAB SITE | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 5 | Bronx | 40.8679 | -73.87809
8 | 01/09/2020 | AQS | 360050133 | 1 | 0.3 | ppm | 3 | PFIZER LAB SITE | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 5 | Bronx | 40.8679 | -73.87809
9 | 01/10/2020 | AQS | 360050133 | 1 | 0.5 | ppm | 6 | PFIZER LAB SITE | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 5 | Bronx | 40.8679 | -73.87809

Last rows

Index | Date | Source | Site ID | POC | Daily Max 8-hour CO Concentration | UNITS | DAILY_AQI_VALUE | Site Name | DAILY_OBS_COUNT | PERCENT_COMPLETE | AQS_PARAMETER_CODE | AQS_PARAMETER_DESC | CBSA_CODE | CBSA_NAME | STATE_CODE | STATE | COUNTY_CODE | COUNTY | SITE_LATITUDE | SITE_LONGITUDE
6601 | 12/22/2021 | AQS | 361030044 | 1 | 0.3 | ppm | 3 | Flax Pond | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 103 | Suffolk | 40.961017 | -73.139046
6602 | 12/23/2021 | AQS | 361030044 | 1 | 0.2 | ppm | 2 | Flax Pond | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 103 | Suffolk | 40.961017 | -73.139046
6603 | 12/24/2021 | AQS | 361030044 | 1 | 0.2 | ppm | 2 | Flax Pond | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 103 | Suffolk | 40.961017 | -73.139046
6604 | 12/25/2021 | AQS | 361030044 | 1 | 0.4 | ppm | 5 | Flax Pond | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 103 | Suffolk | 40.961017 | -73.139046
6605 | 12/26/2021 | AQS | 361030044 | 1 | 0.4 | ppm | 5 | Flax Pond | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 103 | Suffolk | 40.961017 | -73.139046
6606 | 12/27/2021 | AQS | 361030044 | 1 | 0.2 | ppm | 2 | Flax Pond | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 103 | Suffolk | 40.961017 | -73.139046
6607 | 12/28/2021 | AQS | 361030044 | 1 | 0.2 | ppm | 2 | Flax Pond | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 103 | Suffolk | 40.961017 | -73.139046
6608 | 12/29/2021 | AQS | 361030044 | 1 | 0.2 | ppm | 2 | Flax Pond | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 103 | Suffolk | 40.961017 | -73.139046
6609 | 12/30/2021 | AQS | 361030044 | 1 | 0.3 | ppm | 3 | Flax Pond | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 103 | Suffolk | 40.961017 | -73.139046
6610 | 12/31/2021 | AQS | 361030044 | 1 | 0.3 | ppm | 3 | Flax Pond | 24 | 100.0 | 42101 | Carbon monoxide | 35620 | New York-Newark-Jersey City, NY-NJ-PA | 36 | New York | 103 | Suffolk | 40.961017 | -73.139046